Move tarantoolctl to test-run tool submodule and add replication_sync_timeout to it #242

avtikhon · 2020-11-24T13:47:17Z

Move tarantoolctl to test-run tool submodule

Moved tarantoolctl to test-run tool submodule repository as:

/test-run/.tarantoolctl

Also set backward compability to use old path location:

/test/.tarantoolctl

as its primary place.

To set PWD environment for tarantoolctl for popen() call tried to use
os.environ. It was needed for the later commits where tarantoolctl
would be moved from to test-run repository and its configuration file
would be read right from the working directory of the test. In this
case all os.putenv() calls changed to environment set with os.environ,
to be able to modify environment directly, check [1] ("Note: Calling
putenv() directly does not change os.environ"). But in this case all
the tests had to be changed to use os.environ instead of os.putenv(),
it will block the use of os.putenv() in the future tests. Decided to
avoid of this way and temporary change 'PWD' around the Popen() call.

Needed for test: flaky hang tests tarantool#5504
Part of Simplify configuration needed for setup tap / tarantool tests #78

[1] - https://docs.python.org/2/library/os.html#process-parameters
Setup replication_sync_timeout at .tarantoolctl

Found that tests may fail due to hang in seek_wait() loop on starting
and stopping instances. It happened, because instances were not synced
within default output timeout which is by default 120 seconds, while
replication sync happens only each 300 seconds by default. To fix it
replication_sync_timeout should be decreased to some value lower than
'no-output-timeout', so decided to set it to 100 seconds.

The issue looked like in tests:
```
  --- replication/gh-5140-qsync-casc-rollback.result    Mon Oct 19
  17:29:46 2020
  +++ /rw_bins/test/var/070_replication/gh-5140-qsync-casc-rollback.result
  Mon Oct 19 17:31:35 2020
  @@ -169,56 +169,3 @@
   -- all the records are replayed one be one without yields for WAL writes, and
   -- nothing should change.
   test_run:cmd('restart server default')
  - |
  -test_run:cmd('restart server replica')
  - | ---
  - | - true
  - | ...
  -
```
Part of test: flaky hang tests tarantool#5504

Totktonada · 2020-11-28T03:41:50Z

Please, mark the first commit as part of #78.

.tarantoolctl

lib/tarantool_server.py

Totktonada · 2020-11-30T01:50:49Z

Please, also mention the issue in tarantool that motivates those changes.

avtikhon · 2020-11-30T06:38:07Z

Please, also mention the issue in tarantool that motivates those changes.

Made all the changes as you suggested, please review.

lib/tarantool_server.py

Totktonada · 2020-12-05T00:37:58Z

Made several changes:

Moved the comment re .tarantoolctl file search into the appropriate commit.
Renamed repl_sync_timeout to replication_sync_timeout. There is no need to abbreviate it.
Extracted PWD tweaking into its own commit, because it deserves its own explanation and may be considered as the bug fix on its own. Used os.environ here, because why not. Always set PWD to actual cwd with os.getcwd(), because, again, why not.
Fixed misusage of 'tarantoolctl' word, where '.tarantoolctl' configuration file was meant.
Rebased on top of the current master.

Totktonada

I made some tweaks and now everything looks okay for me. I think we both spin around here enough time and the change can be pushed. I verified in-source and out-of-source builds before and after the corresponding tarantool patch and everything looks okay too.

Totktonada · 2020-12-05T01:05:45Z

Found a nice side effect: now we can run ./test/test-run.py and cd test && ./test-run.py is not mandatory anymore. I'll add a note about this into the commit message.

test-run copies .tarantooctl file from a test/ directory to a worker's working directory (e.g. test/var/001_app). It would be quite intuitive if tarantoolctl would use this copy when operating on tarantool instances of the worker. Suprisingly, it is not so. tarantoolctl checks whether the file exists inside a directory pointed by the PWD environment variable, not a real current working directory. test-run did not set PWD prior to this commit, however everything work. Why? Because test-run.py is always executed from the test/ directory, which contains the same .tarantoolctl file. Even `make test` / `make test-force` targets issue commands like `cd <...>/test && <...>/test-run.py <...>` (where `cd` sets PWD). The time has come. Let's set PWD and use test/var/001_app/.tarantoolctl configuration for instances of the 001_app worker. Side note: it would be good to fix tarantoolctl too to use a real cwd, not PWD. This inconsistency was spotted, when we perform attempt to remove the test/.tarantoolctl config from the main tarantool repository, move it to the test-run repository, but keep copying into a worker directory. As a side effect, test-run now could be invoked without manual changing a current directory to test/. So, where we run a command like this (example for in-source build): | (cd test && ./test-run.py) The following command can be used: | ./test/test-run.py Nice! Sadly, the trick `Popen(<...>, env=dict(os.environ, PWD=self.vardir))` does not work for us, because we use os.putenv() in test-run and Python tests. os.putenv() does not update os.environ, so we would feed non-actual environment to the subprocess. Because of this, we just set PWD before and after Popen() call. Part of tarantool/tarantool#5504 Part of #78 Co-authored-by: Alexander V. Tikhonov <[email protected]>

Moved .tarantoolctl to test-run tool submodule repository as: <tarantool repository>/test-run/.tarantoolctl Also set backward compability to use old path location: <tarantool repository>/test/.tarantoolctl as its primary place. Needed for tarantool/tarantool#5504 Part of #78

Found that tests may fail due to hang in seek_wait() loop on starting and stopping instances. It happened, because instances were not synced within default output timeout which is by default 120 seconds, while replication sync happens only each 300 seconds by default. To fix it replication_sync_timeout should be decreased to some value lower than 'no-output-timeout', so decided to set it to 100 seconds. The issue looked like in tests: --- replication/gh-5140-qsync-casc-rollback.result Mon Oct 19 17:29:46 2020 +++ /rw_bins/test/var/070_replication/gh-5140-qsync-casc-rollback.result Mon Oct 19 17:31:35 2020 @@ -169,56 +169,3 @@ -- all the records are replayed one be one without yields for WAL writes, and -- nothing should change. test_run:cmd('restart server default') - | -test_run:cmd('restart server replica') - | --- - | - true - | ... - Part of tarantool/tarantool#5504

Totktonada · 2020-12-05T23:48:46Z

(Regarding the last force-push.) Fixed a typo in a commit message.

See commits in the PR [1] for detailed description of the changes. User visible changes are the following. 1. Now test-run.py can be invoked from any directory without changing a current working directory to `test/`. 2. The `test/.tarantoolctl` configuration file is not mandatory and can be removed. It is shipped now within the test-run repository. 3. test-run sets the `replication_sync_timeout` box.cfg() option when the `test/.tarantoolctl` is not present in a parent repository. The value is controlled by the --replication-sync-timeout argument and defaults to 100 seconds (unlike tarantool's default, which is 300 seconds). The reason of the changes is to set default `replication_sync_timeout` for all tests to a value lower than `--no-output-timeout` (120 seconds) to allow instances to step into the orphan mode before this deadline and see more descriptive picture when it leads to failure of a test. What is also important, when a test fails before the `--no-output-timeout`, we able to restart it based on the `fragile` suite.ini option and / or collect artifacts to store them in CI. The `--no-output-timeout` deadline remains the show-stopper. We'll introduce a test execution timeout later to step into the general `--no-output-timeout` only in quite rare and unusual cases. The next commit will actually remove `test/.tarantoolctl`, so the new `replication_sync_timeout` will be in effect. [1]: tarantool/test-run#242 Part of #5504

See commits in the PR [1] for detailed description of the changes. User visible changes are the following. 1. Now test-run.py can be invoked from any directory without changing a current working directory to `test/`. 2. The `test/.tarantoolctl` configuration file is not mandatory and can be removed. It is shipped now within the test-run repository. 3. test-run sets the `replication_sync_timeout` box.cfg() option when the `test/.tarantoolctl` is not present in a parent repository. The value is controlled by the --replication-sync-timeout argument and defaults to 100 seconds (unlike tarantool's default, which is 300 seconds). The reason of the changes is to set default `replication_sync_timeout` for all tests to a value lower than `--no-output-timeout` (120 seconds) to allow instances to step into the orphan mode before this deadline and see more descriptive picture when it leads to failure of a test. What is also important, when a test fails before the `--no-output-timeout`, we able to restart it based on the `fragile` suite.ini option and / or collect artifacts to store them in CI. The `--no-output-timeout` deadline remains the show-stopper. We'll introduce a test execution timeout later to step into the general `--no-output-timeout` only in quite rare and unusual cases. The next commit will actually remove `test/.tarantoolctl`, so the new `replication_sync_timeout` will be in effect. [1]: tarantool/test-run#242 Part of #5504 (cherry picked from commit 6c8efa8)

Totktonada · 2020-12-06T00:52:13Z

Updated the test-run submodule in tarantool in the following commits: 2.7.0-97-g6c8efa849, 2.6.1-84-g0d738037b, 2.5.2-48-gfa36857a3, 1.10.8-33-g64e2fb849.

Totktonada · 2020-12-06T01:50:57Z

Removed the test/.tarantoolctl file from the tarantool repository in the following commits: 2.7.0-98-g54667f6fa, 2.6.1-85-g8d4623652, 2.5.2-49-gabd4660df, 1.10.8-34-gb5d31b60a. After this the new --replication-sync-timeout option (and its default) will be in effect.

avtikhon requested a review from Totktonada November 24, 2020 13:47

avtikhon self-assigned this Nov 24, 2020

avtikhon mentioned this pull request Nov 27, 2020

Add the 'timeout' option into 'stop server' command and kill a server… #186

Merged

Totktonada reviewed Nov 29, 2020

View reviewed changes

.tarantoolctl Outdated Show resolved Hide resolved

Totktonada reviewed Nov 30, 2020

View reviewed changes

lib/tarantool_server.py Show resolved Hide resolved

avtikhon force-pushed the avtikhon/move_tarantoolctl_config branch from 1638eaa to 4320eff Compare November 30, 2020 06:34

avtikhon requested a review from Totktonada November 30, 2020 06:37

avtikhon force-pushed the avtikhon/move_tarantoolctl_config branch 2 times, most recently from e4fc9fb to 944f987 Compare November 30, 2020 07:03

avtikhon added [3sp] labels Nov 30, 2020

avtikhon force-pushed the avtikhon/move_tarantoolctl_config branch 2 times, most recently from b190e0e to 8928998 Compare December 3, 2020 06:32

Totktonada mentioned this pull request Dec 4, 2020

Simplify configuration needed for setup tap / tarantool tests #78

Open

3 tasks

Totktonada reviewed Dec 4, 2020

View reviewed changes

lib/tarantool_server.py Outdated Show resolved Hide resolved

Totktonada force-pushed the avtikhon/move_tarantoolctl_config branch from 8928998 to 3a48123 Compare December 5, 2020 00:36

Totktonada approved these changes Dec 5, 2020

View reviewed changes

Totktonada force-pushed the avtikhon/move_tarantoolctl_config branch from 3a48123 to 976c836 Compare December 5, 2020 01:06

Totktonada and others added 3 commits December 6, 2020 02:46

Totktonada force-pushed the avtikhon/move_tarantoolctl_config branch from 976c836 to d36db6c Compare December 5, 2020 23:47

Totktonada merged commit e843552 into master Dec 5, 2020

Totktonada deleted the avtikhon/move_tarantoolctl_config branch December 5, 2020 23:49

Totktonada mentioned this pull request Dec 6, 2020

Give a warning when configured timeouts look strange #250

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Move tarantoolctl to test-run tool submodule and add replication_sync_timeout to it #242

Move tarantoolctl to test-run tool submodule and add replication_sync_timeout to it #242

avtikhon commented Nov 24, 2020 •

edited

Loading

Totktonada commented Nov 28, 2020

Totktonada commented Nov 30, 2020

avtikhon commented Nov 30, 2020

Totktonada commented Dec 5, 2020

Totktonada left a comment

Totktonada commented Dec 5, 2020

Totktonada commented Dec 5, 2020

Totktonada commented Dec 6, 2020

Totktonada commented Dec 6, 2020

Move tarantoolctl to test-run tool submodule and add replication_sync_timeout to it #242

Move tarantoolctl to test-run tool submodule and add replication_sync_timeout to it #242

Conversation

avtikhon commented Nov 24, 2020 • edited Loading

Totktonada commented Nov 28, 2020

Totktonada commented Nov 30, 2020

avtikhon commented Nov 30, 2020

Totktonada commented Dec 5, 2020

Totktonada left a comment

Choose a reason for hiding this comment

Totktonada commented Dec 5, 2020

Totktonada commented Dec 5, 2020

Totktonada commented Dec 6, 2020

Totktonada commented Dec 6, 2020

avtikhon commented Nov 24, 2020 •

edited

Loading